On the Role of Sparsity in Compressed Sensing and Random Matrix Theory

Abstract - We discuss applications of some concepts of Compressed
Sensing in the recent work on invertibility of random
matrices due to Rudelson and the author. We sketch an argument
leading to the optimal bound $\Omega(N^{-1/2})$ on the median of the
smallest singular value of an $N \times N$ matrix with random
independent entries. We highlight the parts of the argument
where sparsity ideas played a key role.

I. Introduction 

A concept that underlies the many recent developments in 
the area of Compressed Sensing is sparsity. Much earlier, 
sparsity has been used in a similar way (but often implicitly) 
in theoretical mathematics, and most notably in Geometric 
Functional Analysis. Recently the understanding of the role of 
sparsity led to formalizing some of the connections between 
the statements in those areas, leading to a new interplay 
between "pure" and "applied" mathematics. 

While applications of "pure" mathematics to Compressed 
Sensing are expected and indeed quite common, the reverse 
direction - from Compressed Sensing to mathematics - is 
still rarely seen. This paper discusses one such application 
to the problem of invertibility of random matrices, which has 
been addressed in particular in the papers of Rudelson and 
the author [6], [?] and [11]. Since we would like to focus
here on techniques rather than results, we will often make
oversimplifying assumptions and state weaker forms of the
results. For the same reason, we discuss very little of the history
of these results and related work. The interested reader is
encouraged to look at the original papers cited above for the 
statements of complete results, and for bibliography. 

We will denote positive absolute constants by $C, c, C_1, \ldots$;
their values may change from line to line. 

II. Sparsity as entropy control 

One normally thinks of sparsity as a way to represent objects
(vectors or functions) in a certain basis economically, so that
only a small number of basis elements is needed to represent
each object accurately. In this discussion, we shall identify
the basis with the canonical basis of $\mathbb{R}^N$, and our
objects will be vectors in $\mathbb{R}^N$. We then say that a
vector $x \in \mathbb{R}^N$ is s-sparse if it has few non-zero coordinates:

$$|\mathrm{supp}(x)| \le s \le N.$$

We shall denote the set of all such vectors in $\mathbb{R}^N$ by
Sparse(N, s). This set is clearly the union of all s-dimensional
coordinate subspaces of $\mathbb{R}^N$.

An efficient way to use sparsity is through control of the
metric entropy of the set Sparse(N, s). Recall that, given a
subset S of a metric space and a number $\varepsilon > 0$, the covering
number $N(S, \varepsilon)$ is the smallest cardinality of an $\varepsilon$-net of S, i.e.
the smallest number of $\varepsilon$-balls centered at points in S needed
to cover S. The logarithm of the covering number is often
called the metric entropy.

A simple argument based on comparison of volumes leads
to an exponential bound on the covering numbers of many natural
subsets of $\mathbb{R}^N$, and in particular of the Euclidean sphere $S^{N-1}$,
see e.g. Lemma 9.5 in [?]. This bound for the interesting range
$\varepsilon \in (0, 1)$ reads as

$$N(S^{N-1}, \varepsilon) \le (3/\varepsilon)^N. \tag{1}$$
This bound improves significantly for the set of sparse vectors.
Since there are $\binom{N}{s}$ ways to choose the support of a sparse
vector, we have

$$N(\mathrm{Sparse}(N, s) \cap S^{N-1}, \varepsilon) \le \binom{N}{s} \, N(S^{s-1}, \varepsilon).$$

Using (1) along with the bound $\binom{N}{s} \le (eN/s)^s$ valid for
$1 \le s \le N/2$, which follows from Stirling's formula, we
conclude with

$$N(\mathrm{Sparse}(N, s) \cap S^{N-1}, \varepsilon) \le \big(CN/(\varepsilon s)\big)^s. \tag{2}$$



Comparing this with (1), we see that sparse vectors enjoy
significantly smaller entropy than the whole sphere: the
covering number is essentially exponential in the sparsity s
rather than in the dimension N. This advantage is used crucially
in many arguments, including the proof of Proposition 1 below.
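
The comparison between (1) and (2) can be made concrete numerically. The following is a minimal sketch, not part of the original argument: it evaluates the logarithm of the volumetric bound for the whole sphere, the logarithm of the bound obtained by counting supports, and the simplified form that uses the binomial estimate $(eN/s)^s$. The function names and the sample values of N, s and the net parameter are illustrative assumptions.

```python
import math


def log_cover_sphere(N, eps):
    """Logarithm of the volumetric bound (3/eps)^N for the sphere S^{N-1}."""
    return N * math.log(3.0 / eps)


def log_cover_sparse(N, s, eps):
    """Logarithm of binom(N, s) * (3/eps)^s: choose a support, then cover S^{s-1}."""
    return math.log(math.comb(N, s)) + s * math.log(3.0 / eps)


def log_cover_sparse_simplified(N, s, eps):
    """Logarithm of (3 e N / (eps s))^s, obtained from binom(N, s) <= (eN/s)^s."""
    return s * math.log(3.0 * math.e * N / (eps * s))


# Illustrative (hypothetical) parameters: a high dimension and a small sparsity.
N, s, eps = 10_000, 50, 0.5
print("whole sphere      :", log_cover_sphere(N, eps))
print("sparse vectors    :", log_cover_sparse(N, s, eps))
print("sparse, simplified:", log_cover_sparse_simplified(N, s, eps))
```

With these sample values the entropy of the sparse unit vectors is smaller than that of the sphere by more than an order of magnitude, which is the gap the union bound arguments exploit.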

In Compressed Sensing, a basic property of matrices that
guarantees their good performance as measurement operators
is the Restricted Isometry Condition. An $n \times N$ matrix A with
$n < N$ is said to satisfy the Restricted Isometry Condition
(RIC) if A acts as an approximate isometry when restricted to
the set of sparse vectors. Formally, for every integer $s \le N$
we define the RIC constant $\delta_s$ of the matrix A as the minimal
number that satisfies the two-sided inequality

$$(1 - \delta_s) \|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_s) \|x\|_2^2 \tag{3}$$

for all $x \in \mathrm{Sparse}(N, s)$. Candes and Tao [?] have shown
(with the constant improved in [?]) that, given a matrix A with
$\delta_{2s} < \sqrt{2} - 1$, one can exactly recover every s-sparse vector x
from its "measurement vector" y = Ax by solving the convex
optimization problem

$$\min \|x\|_1 \quad \text{subject to} \quad Ax = y.$$

While it is difficult to explicitly construct matrices with 
good dimensions and RIC constants, random constructions 
are abundant in the literature (see Section V in [3]). Here 
we sketch the known argument for Gaussian matrices, which 
will highlight sparsity as entropy control, and will lead to 
our discussion of more difficult questions in Random Matrix 
Theory. 

Proposition 1 (Gaussian matrices): Let A be an $n \times N$
matrix whose entries are independent standard normal random
variables. Let $1 \le s \le N$ and $\delta > 0$. If

$$n \ge C(\delta) \, s \log(N/s)$$

then, with high probability, the matrix $\bar{A} = \frac{1}{\sqrt{n}} A$ satisfies RIC
with constant $\delta_s \le \delta$. Here $C(\delta) > 0$ depends only on $\delta$.

Proof: (Sketch) An approximation argument shows that
it is enough to check (3) for all x in any fixed $\delta$-net of
$\mathrm{Sparse}(N, s) \cap S^{N-1}$. So we choose such a net $\mathcal{N}$ of
cardinality controlled as in (2), and we fix a vector $x \in \mathcal{N}$. Due
to independence of the rows of A and the rotation invariance
of the normal distribution, the random variable $\|Ax\|_2^2$ is
distributed identically with $\chi_n^2 := \sum_{i=1}^n g_i^2$, where $g_i$ are
independent standard normal random variables. By the known
concentration properties of the $\chi^2$ distribution, or alternatively
by the standard exponential concentration inequalities, one has

$$(1 - \delta) n \le \sum_{i=1}^n g_i^2 \le (1 + \delta) n$$

with probability at least $1 - e^{-c(\delta) n}$. In other words, with this
probability, the Restricted Isometry Condition (3) holds for the
matrix $\bar{A}$ and a fixed vector $x \in \mathcal{N}$. Taking the union bound and using the
bound (2) on the cardinality of the net, we see that (3) holds
for all vectors $x \in \mathcal{N}$ with probability at least

$$1 - |\mathcal{N}| \, e^{-c(\delta) n} \ge 1 - \big(CN/(\delta s)\big)^s e^{-c(\delta) n}.$$

By the condition we made on the dimensions, this probability is
close to 1, and the proof is complete. ■
The argument above can be easily generalized to distributions
other than normal by using standard exponential concentration
inequalities; suitable moment bounds (subgaussian) are
sufficient for this purpose.
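
The single-vector step of the proof sketch can be probed numerically. The sketch below is an illustration under stated assumptions, not the argument of the cited papers: it draws one sparse unit vector x and one Gaussian matrix, and evaluates the normalized energy $\|Ax\|_2^2 / n$, which should be close to 1 when n is large. The helper names, the seed and the dimensions are hypothetical choices. Only the columns of the matrix on the support of x are sampled, since the remaining entries are multiplied by zero.

```python
import math
import random


def sparse_unit_vector(N, s, rng):
    """Random s-sparse unit vector in R^N: random support, Gaussian coefficients."""
    support = rng.sample(range(N), s)
    coeffs = [rng.gauss(0.0, 1.0) for _ in support]
    norm = math.sqrt(sum(c * c for c in coeffs))
    x = [0.0] * N
    for k, c in zip(support, coeffs):
        x[k] = c / norm
    return x


def normalized_energy(n, x, rng):
    """Return ||A x||_2^2 / n for an n x N matrix A with standard normal entries.

    The matrix is generated row by row, and only the entries that meet the
    support of x are drawn, because all other entries are multiplied by zero.
    """
    support = [k for k, xk in enumerate(x) if xk != 0.0]
    total = 0.0
    for _ in range(n):
        row_dot = sum(rng.gauss(0.0, 1.0) * x[k] for k in support)
        total += row_dot * row_dot
    return total / n


rng = random.Random(0)               # fixed seed, for reproducibility only
N, s, n = 1000, 10, 2000             # illustrative dimensions
x = sparse_unit_vector(N, s, rng)
energy = normalized_energy(n, x, rng)
print("||A x||^2 / n =", energy)     # concentrates around 1 as n grows
```

Passing from one vector to all sparse vectors is exactly where the entropy bound (2) and the union bound enter; the simulation does not attempt that step.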

III. Invertibility of random matrices

One can view the Restricted Isometry Condition (3) as
the condition that all submatrices of A with a given number
of columns are well conditioned. The question of how well
conditioned random matrices are goes back at least to von
Neumann and his collaborators, in connection with their work
on large matrix inversion. Some history of the work on this
problem is described in [6] and [7], and some new results
have appeared since then, see [10]. Here we shall focus on the
original prediction going back to von Neumann and his group:
that the smallest singular value $s_N(A)$ of an $N \times N$
matrix with random independent centered entries is typically
of order $N^{-1/2}$. Coupled with the known estimate on the
largest singular value $\mathbb{E} s_1(A) \lesssim N^{1/2}$ (valid under suitable
moment assumptions), the prediction implies that the condition
number $\kappa(A) = s_1(A)/s_N(A) = O(N)$, i.e. it is typically linear
in the dimension.

This prediction was verified for Gaussian matrices in [4],
[9] using the explicit formula for the joint density of their
eigenvalues, and was first proved for general random matrices
in [6] under some mild moment assumptions. Ideas based on
sparsity play an important role in [6]. We will discuss this
role in the rest of the paper, and sketch the proof of the
prediction above:

Theorem 2: Let A be an $N \times N$ matrix whose entries
are independent identically distributed random variables with
mean zero, unit variance, and fourth moment bounded by a
constant. Then the median of $s_N(A)$ is bounded below by
$cN^{-1/2}$.

Note that the result is sharp: it was proved in [8] that the
median of $s_N(A)$ is bounded above by $CN^{-1/2}$.

IV. Invertibility on sparse vectors

Our plan is to first prove Theorem 2 for Gaussian matrices
A (all of whose entries are independent standard normal random
variables), and then to indicate how to modify the proof for
general distributions.

The smallest singular value has the following convenient
expression:

$$s_N(A) = \min_{x \in S^{N-1}} \|Ax\|_2.$$

Our goal is then to bound $\|Ax\|_2$ below uniformly for all unit
vectors x.

We already know how to achieve this goal for all sparse
vectors x. Indeed, by Proposition 1 the Gaussian matrices
satisfy the Restricted Isometry Condition. If we choose n = N
and s = cN with a sufficiently small absolute constant c > 0,
Proposition 1 shows that, with high probability,

$$\min_{x \in \mathrm{Sparse}(N, cN) \cap S^{N-1}} \|Ax\|_2 \ge cN^{1/2}.$$

Note that this bound is much better than we need in Theorem 2:
we would be happy with $cN^{-1/2}$ on the right-hand side.
Now we need to handle the non-sparse vectors.

V. Invertibility on spread vectors

Our success with sparse vectors is due to the fact that there
are "not too many" of them. As we have seen by comparing
(2) to (1), the metric entropy of the set of sparse vectors is
much smaller than that of all vectors. Such a nice entropy
control allowed us to handle all sparse vectors by taking a
union bound (in the proof of Proposition 1) without paying
too high a price in the probability estimates.

Repeating a similar argument for non-sparse vectors is
hopeless, as they lack a nice entropy control. Instead, we
could first try to identify the class of vectors which is entirely
opposite to the sparse vectors, and try to handle this class.
These are spread vectors: those vectors in $S^{N-1}$ all of whose
coordinates have the same order $N^{-1/2}$. An advantage of
spread vectors over sparse ones is that we know the magnitude
of all their coefficients. So we develop the following geometric
argument to prove the invertibility on the set of spread vectors.

Let us begin with a qualitative argument. Suppose the matrix
A performs extremely poorly, and we have $s_N(A) = 0$; in other
words, A is a singular matrix. Then one of the columns
$X_k$ of A lies in the span $H_k = \mathrm{span}(X_i)_{i \ne k}$ of the others.

This simple observation can be made into a quantitative
argument, which will work very well with the spread vectors.
Suppose $x = (x_1, \ldots, x_N) \in \mathbb{R}^N$ is a spread vector. Then,
for every $k = 1, \ldots, N$, we have

$$
\begin{aligned}
\|Ax\|_2 \ge \mathrm{dist}(Ax, H_k) &= \mathrm{dist}\Big(\sum_{i=1}^N x_i X_i, \, H_k\Big) \\
&= \mathrm{dist}(x_k X_k, H_k) = |x_k| \cdot \mathrm{dist}(X_k, H_k) \\
&\ge cN^{-1/2} \, \mathrm{dist}(X_k, H_k).
\end{aligned} \tag{4}
$$

Since the right-hand side does not depend on x, we have proved
in particular (for k = 1) that

$$\min_{\text{Spread } x} \|Ax\|_2 \ge cN^{-1/2} \, \mathrm{dist}(X_1, H_1).$$
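
The geometric inequality in (4) is deterministic and can be checked directly on a sample matrix. The following is a minimal sketch under stated assumptions: the columns of a small Gaussian matrix play the role of the vectors $X_k$, distances to the spans of the remaining columns are computed by Gram-Schmidt orthogonalization, and the test vector is the perfectly spread vector with all coordinates equal to $N^{-1/2}$. All function names, the seed and the dimension are illustrative.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dist_to_span(v, others):
    """Euclidean distance from v to span(others), via Gram-Schmidt."""
    basis = []
    for u in others:
        w = list(u)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:                      # skip numerically dependent vectors
            basis.append([wi / norm for wi in w])
    residual = list(v)
    for b in basis:
        c = dot(residual, b)
        residual = [ri - c * bi for ri, bi in zip(residual, b)]
    return math.sqrt(dot(residual, residual))


rng = random.Random(1)                        # fixed seed, for reproducibility only
N = 8                                         # small illustrative dimension
cols = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(N)]  # cols[k] = X_k
x = [1.0 / math.sqrt(N)] * N                  # a perfectly spread unit vector

# A x is the linear combination of the columns with coefficients x_k.
Ax = [sum(x[k] * cols[k][i] for k in range(N)) for i in range(N)]
norm_Ax = math.sqrt(dot(Ax, Ax))

# Lower bounds |x_k| * dist(X_k, H_k), one for each column index k.
lower_bounds = [
    abs(x[k]) * dist_to_span(cols[k], cols[:k] + cols[k + 1:]) for k in range(N)
]
print("||Ax||_2                 =", norm_Ax)
print("max_k |x_k| dist(X_k,H_k) =", max(lower_bounds))
```

Every entry of `lower_bounds` is a valid lower bound for `norm_Ax`, so in particular their maximum is.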

It remains to estimate the distance between the random
vector $X_1$ and the independent random hyperplane $H_1$. Since
$X_1$ is a Gaussian vector, it is easy to check (using the rotation
invariance of the Gaussian distribution) that $\mathrm{dist}(X_1, H_1)$ is
distributed identically with the absolute value of a standard
normal random variable g. But the density of g is bounded by
the absolute constant $(2\pi)^{-1/2}$, which makes

$$\mathrm{dist}(X_1, H_1) = |g| = \Omega(1)$$

with arbitrarily high constant probability (say, 0.999).
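
The density bound translates into an explicit small-ball estimate: the probability that |g| falls below a threshold is at most the length of the interval times the maximal density. The short sketch below, with illustrative function names, compares the exact probability (via the error function) with this bound.

```python
import math


def prob_abs_gauss_below(eps):
    """Exact P(|g| < eps) for a standard normal g, via the error function."""
    return math.erf(eps / math.sqrt(2.0))


def density_bound(eps):
    """Interval length 2*eps times the maximal density (2*pi)^(-1/2)."""
    return 2.0 * eps / math.sqrt(2.0 * math.pi)


for eps in (0.001, 0.01, 0.1, 1.0):
    print(eps, prob_abs_gauss_below(eps), density_bound(eps))
```

In particular, the probability that |g| is below 0.001 is itself below 0.001, so |g| is at least 0.001 with probability above 0.999.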

We have thus shown that, with arbitrarily high constant
probability,

$$\min_{\text{Spread } x} \|Ax\|_2 \ge cN^{-1/2}.$$

This is the desired uniform bound for the spread vectors.

VI. Bridging sparse and spread vectors 

There are of course many vectors that are neither sparse nor 
spread, but it will now be relatively easy to bridge these two 
classes. 

Consider all vectors in $S^{N-1}$ that are within a small
absolute constant distance d > 0 from the set of sparse
vectors $\mathrm{Sparse}(N, cN) \cap S^{N-1}$. We shall call such vectors
compressible, and the rest of the vectors on the sphere
incompressible. The intuition, which again comes from
sparse recovery, suggests that compressible vectors should behave
similarly to sparse vectors, while incompressible vectors
should be similar to spread vectors.

Indeed, a trivial approximation argument extends our invertibility
bound from sparse to compressible vectors (one just
needs to approximate a compressible vector by a sparse one and
use that the error d of this approximation can only blow up
by a factor $\|A\| = O(N^{1/2})$). So we have the desired bound

$$\min_{\text{Compressible } x} \|Ax\|_2 \ge cN^{1/2}.$$

For incompressible vectors, instead of an approximation
argument (which won't work) one makes the following simple
observation: every incompressible vector has $\Omega(N)$ coordinates
of magnitude $\Omega(N^{-1/2})$. This is the sense in which
incompressible vectors are similar to spread ones.
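
One way to see this observation is a counting argument, sketched here under a simplifying assumption: distances are measured to the set Sparse(N, s) itself rather than to its intersection with the sphere (for unit vectors the two distances differ by at most a factor of 2). If a unit vector x is at distance more than d from the s-sparse vectors, then more than s of its coordinates have magnitude at least $d/\sqrt{2N}$; otherwise the small coordinates would carry squared norm below $d^2/2$, and keeping only the large ones would give an s-sparse approximation within distance d. The function names, the seed and the parameters below are hypothetical.

```python
import math
import random


def dist_to_sparse(x, s):
    """Distance from x to the s-sparse vectors: the Euclidean norm of x after
    its s largest-magnitude coordinates are removed."""
    tail = sorted((abs(t) for t in x), reverse=True)[s:]
    return math.sqrt(sum(t * t for t in tail))


def count_large(x, threshold):
    """Number of coordinates of x with magnitude at least the threshold."""
    return sum(1 for t in x if abs(t) >= threshold)


def random_unit_vector(N, rng):
    g = [rng.gauss(0.0, 1.0) for _ in range(N)]
    norm = math.sqrt(sum(t * t for t in g))
    return [t / norm for t in g]


N, s, d = 200, 20, 0.1                        # illustrative parameters
x = random_unit_vector(N, random.Random(2))
threshold = d / math.sqrt(2 * N)
print("distance to s-sparse vectors:", dist_to_sparse(x, s))
print("coordinates of magnitude >=", threshold, ":", count_large(x, threshold))
```

For a typical random direction almost all coordinates pass the threshold, while a vector such as the first basis vector is at distance zero from the sparse set and has a single large coordinate.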

To complete the proof for incompressible vectors, we again
use the geometric argument, but stop just before the last
estimate in (4):

$$\|Ax\|_2 \ge \max_k |x_k| \cdot \mathrm{dist}(X_k, H_k).$$

As we already know, for each $k = 1, \ldots, N$, the distance
satisfies $\mathrm{dist}(X_k, H_k) = \Omega(1)$ with arbitrarily high constant
probability. Therefore, still with high probability, most of these
distances (an arbitrarily high constant proportion of them) are
$\Omega(1)$. On the other hand, we also know that some fixed
proportion of the coordinates $x_k$ are of magnitude $\Omega(N^{-1/2})$.
Therefore, intersecting these two sets of indices, we see that for
any incompressible vector x there exists a coordinate k that
satisfies both bounds. This implies that, with high probability,

$$\min_{\text{Incompressible } x} \|Ax\|_2 \ge cN^{-1/2}.$$

This is the desired bound which, along with the already proved
estimate for compressible vectors, implies the final result:

$$s_N(A) = \min_{x \in S^{N-1}} \|Ax\|_2 \ge cN^{-1/2}.$$

VII. Extensions and further remarks 

The above argument generalizes from Gaussian to general
distributions. There are two places where we used rotation
invariance of the Gaussian distribution. One such place was
the use of Proposition 1 in the treatment of sparse vectors. As
we already mentioned, Proposition 1 can be easily extended to
more general distributions using the standard large deviation
inequalities.

The other place where the Gaussian distribution was used was
in the argument for spread vectors. We argued there that the
distance $\mathrm{dist}(X_1, H_1)$ between a random vector and an
independent random hyperplane is $\Omega(1)$ with arbitrarily high
constant probability. For Gaussian entries, this followed by
a direct and easy computation. For more general distributions,
this estimate is still true, but it requires more work.

Let us condition on a realization of the hyperplane $H_1$, and
let $a \in \mathbb{R}^N$ be a unit normal vector of $H_1$. Then clearly

$$\mathrm{dist}(X_1, H_1) = |\langle a, X_1 \rangle|.$$

Writing this in coordinates for $a = (a_1, \ldots, a_N)$ and
$X_1 = (\xi_1, \ldots, \xi_N)$, we see that

$$\mathrm{dist}(X_1, H_1) = \Big| \sum_{i=1}^N a_i \xi_i \Big| =: |S|$$

where S is clearly a sum of independent random variables.

Our goal is to show that $|S| = \Omega(1)$ with high probability.
One way to do this is to use the Central Limit Theorem (in
the form of Berry-Esseen) to approximate S by a standard
normal random variable g, for which we already have the
desired result. For the Central Limit Theorem to work,
one obviously needs that many coordinates of a are not too
small (for example, it will clearly not work if a is 1-sparse,
as the sum S will consist of just one term). However, sparsity
ideas can again be of help here. Running an argument similar
to the one above for compressible vectors, one can show that,
with high probability, the normal a to the random hyperplane
$H_1$ is incompressible. We then condition on such $H_1$, and the
Central Limit Theorem works well:

$$\big| \mathbb{P}(|S| < \varepsilon) - \mathbb{P}(|g| < \varepsilon) \big| = O(N^{-1/2}). \tag{5}$$

This proves the desired bound $\mathrm{dist}(X_1, H_1) = \Omega(1)$ with high
constant probability, up to an error $O(N^{-1/2})$.
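
The role of the normal vector a in the Central Limit Theorem can be illustrated with an exactly computable example. The sketch below makes two assumptions that are not in the text: the entries $\xi_i$ are taken to be independent random signs, and the normal vector is perfectly spread, with all coordinates equal to $N^{-1/2}$. Then S is a normalized sum of N signs, its distribution is binomial, and the small-ball probability can be computed exactly and compared with the Gaussian value. Taking N = 1 in the same function reproduces the 1-sparse case, where S is a single sign. The function names are illustrative.

```python
import math


def prob_abs_sum_below(N, eps):
    """Exact P(|S| < eps) for S = N^(-1/2) * (sum of N independent random signs)."""
    total = 0
    for plus in range(N + 1):                 # number of +1 signs among the N terms
        value = (2 * plus - N) / math.sqrt(N)
        if abs(value) < eps:
            total += math.comb(N, plus)
    return total / 2 ** N


def prob_abs_gauss_below(eps):
    """P(|g| < eps) for a standard normal g."""
    return math.erf(eps / math.sqrt(2.0))


eps = 0.5
for N in (1, 100, 400, 1600):
    exact = prob_abs_sum_below(N, eps)
    gauss = prob_abs_gauss_below(eps)
    print(N, exact, gauss, abs(exact - gauss))
```

For N = 1 the probability is zero while the Gaussian value is a positive constant, so no normal approximation is possible; for the spread normal vector the discrepancy stays within a constant multiple of $N^{-1/2}$, in line with (5).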

In [?], [11], Theorem 2 was extended to rectangular matrices
$N \times n$, where $N \ge n$. Under the same assumptions,
the median of the smallest singular value $s_n(A)$ of such
random matrices is bounded below by $c(\sqrt{N} - \sqrt{n-1})$,
which is asymptotically optimal. Note that for square matrices,
where N = n, this bound is of order $N^{-1/2}$, which agrees with
Theorem 2.

Under stronger moment assumptions on the entries (subgaussian),
not only can the median of the smallest singular value
be estimated, but also strong probability inequalities can
be proved. For example, square matrices satisfy

$$\mathbb{P}\big(s_N(A) \le \varepsilon N^{-1/2}\big) \le C\varepsilon + e^{-cN},$$

and rectangular matrices satisfy

$$\mathbb{P}\big(s_n(A) \le \varepsilon(\sqrt{N} - \sqrt{n-1})\big) \le (C\varepsilon)^{N-n+1} + e^{-cN}$$

for all $\varepsilon > 0$. Proving such exponential inequalities is more
difficult, because one cannot afford the polynomial error in
probability $O(N^{-1/2})$ which one necessarily obtains when
applying the Central Limit Theorem in (5). Instead of using
Central Limit Theorems, one develops a Littlewood-Offord
Theory, whose probability estimates are fine-tuned to the
additive structure of the coefficients of a. Since sparsity
does not play a key role in these arguments, we will not
discuss this direction here. The interested reader is encouraged
to consult the papers [?], [11].